Background: The peroxisomal enzyme 3-ketoacyl-coenzyme A
thiolase of the yeast Saccharomyces cerevisiae is a homodimer
with 417 residues per subunit. It is synthesized in the cytosol
and subsequently imported into the peroxisome, where it
catalyzes the last step of the β-oxidation pathway. We have
determined the structure of this thiolase in order to study the
reaction mechanism, quaternary associations and intracellular
targeting of thiolases generally, and to understand the
structural basis of genetic disorders associated with human
thiolases.

Results: Here we report the crystal structure of unliganded
yeast thiolase refined at 2.8 Å resolution. The enzyme
comprises three domains: two compact core domains having the
same fold, and a loop domain. Each of the two core domains is
folded into a mixed five-stranded β-sheet covered on each side
by helices, and the two are assembled into a five-layered
αβαβα structure. The central layer is formed by two helices,
which point with their amino termini towards the active site.
The loop domain, which is to some extent stabilized by
interactions with the other subunit, runs over the surface of
the two core domains, encircling the active site of its own
subunit.

Conclusions: The crystal structure of thiolase shows that 
the active site is a shallow pocket, shaped by highly 
conserved residues. Two conserved cysteines and a 
histidine at the floor of this pocket probably play key 
roles in the reaction mechanism. The two active sites are 
on the same face of the dimer, far from the amino and 
carboxyl termini of both subunits and the disordered 
amino-terminal import signal sequence.

Introduction

Thiolases are widely distributed in nature, being found in
both prokaryotes and eukaryotes [1,2]. In the latter, they
occur in three different compartments: the cytosol, the
mitochondria and the peroxisomes. There are two classes of
thiolase: those involved in biodegradation [3-ketoacyl-coenzyme
A (CoA) thiolase or thiolase I (EC 2.3.1.16), which catalyzes
the last reaction in the β-oxidation pathway], and those
involved in biosynthetic pathways [acetoacetyl-CoA thiolase or
thiolase II (EC 2.3.1.9)] (see Fig. 1). The biosynthetic and
biodegradative enzymes exhibit sequence similarities and key
active site residues are conserved, suggesting that they have
similar reaction mechanisms, but kinetic differences have been
observed [3]. However, there may be substantial differences
between the binding pockets, because 3-ketoacyl-CoA thiolases
have relatively large substrates consisting of a long 3-keto
fatty acid molecule covalently linked to the CoA moiety, while
the substrate for the acetoacetyl-CoA thiolases is smaller
because the bulky fatty acid is replaced by the acetoacetyl
moiety.

Fig. 1. The reactions catalyzed by thiolase I (3-ketoacyl-CoA
thiolase) and thiolase II (acetoacetyl-CoA thiolase). The
cleavage reaction is thermodynamically favored in both cases.
The reaction catalyzed by thiolase II is a carbon-carbon bond
forming reaction, known as a Claisen condensation [4,5].

There are several reasons for our interest in the structure of
thiolases. Firstly, the details of the reaction mechanism are
not fully understood. Early investigations on mitochondrial pig
heart thiolase and thiolase I [5-7] suggested the involvement
of essential and reactive sulfhydryl groups in the reaction
mechanism and led to the proposal of a two-step mechanism with
an acyl-S enzyme as an obligatory intermediate. A cysteine
residue of pig heart acetoacetyl-CoA thiolase was the first to
be identified as the possible candidate to carry this acyl
group [4]. This cysteine residue, Cys125 in Saccharomyces
cerevisiae thiolase (Fig. 2), corresponds to a cysteine which
is strictly conserved in all known thiolase sequences.
Subsequent studies suggested that at least two sulfhydryl
groups may actually be present near the active site, with only
one of them being involved in the formation of an acyl-enzyme
intermediate during catalysis [6]. From spectroscopic
measurements with thiolase I from porcine heart it was
concluded that the distance between the two active site
cysteines should be smaller than 14 Å [8]. The importance of
cysteine residues has been confirmed by recent investigations
on the bacterial thiolase II of Zoogloea ramigera, which showed
that two cysteines and a histidine are important for catalysis
[9].

Secondly, thiolases have different quaternary structures. The
reasons for this are completely unknown. Because thiolases
occur in prokaryotes as well as in three cellular compartments
in eukaryotes, analysis of the available sequences has been
used to study evolutionary relationships [10]. The peroxisomal
3-ketoacyl-CoA thiolases always occur as dimers, but in the
thiolase family as a whole, dimers, tetramers and hexamers [11]
are observed. For example, the mitochondrial porcine thiolase I
[5] and thiolase II [12] are both homotetramers, and
prokaryotic thiolase is part of a heterotetrameric β-oxidation
complex, α2β2, in which a multi-functional polypeptide (the
α-subunit) is associated with thiolase I (the β-subunit) [2].

Thirdly, the intracellular targeting signals of peroxisomal and
mitochondrial thiolases investigated to date differ from those
of the majority of the matrix proteins of their respective
organelles. Mitochondrial thiolase I is synthesized with a
non-cleavable amino-terminal import signal. Most of the known
peroxisomal matrix proteins contain their targeting information
within their three carboxy-terminal amino acids (peroxisomal
targeting signal 1, PTS1). In contrast, peroxisomal thiolases
of rat liver [13,14] and S. cerevisiae [15,16] have been shown
to possess an amino-terminal targeting signal (PTS2). The
amino-terminal peptide of the rat enzyme is cleaved off after
import into the peroxisome, but this has not yet been
unequivocally demonstrated for the yeast thiolase.

Here we report the 2.8 Å crystal structure of the dimeric
peroxisomal thiolase I of S. cerevisiae (hereafter referred to
as yeast thiolase), a thiolase that has a broad chain-length
specificity. The protein fold is described in detail and
compared with folds of other proteins of known structure. The
implications of the structure for biochemical studies on
protein targeting, for the causes of genetic diseases due to
defects in the thiolase gene, and for the reaction mechanism
are discussed.



Results 

The asymmetric unit of crystallized thiolase contains one yeast
thiolase dimer. The current structure has been built in an
averaged 3.1 Å multiple isomorphous replacement including
anomalous scattering (MIRAS) map and refined at 2.8 Å
resolution. The refinement statistics are shown in Table 1. The
overall shape of the dimer is shown schematically in Fig. 3.
The two subunits are tightly associated by interactions across
the dimer two-fold axis. The two active sites are on one side
of the molecule and the amino and carboxyl termini are on the
opposite side. The largest cross section of the molecule is
observed at the active site face, which has dimensions of
approximately 80 Å × 55 Å. The thickness, measured
perpendicular to the active site face, is ~50 Å (Fig. 3). Three
domains can be recognized in each subunit: two core domains and
a loop domain. The electron density map clearly defines the
complete chain tracing of the core domains, but only an
incomplete trace can be deduced for the loop domain.

The thiolase fold

The three domains of the thiolase subunit are built up from
four different sequence fragments (Fig. 4). The residues at the
amino terminus are mobile. The first residue of the model of
both subunits is Leu28. Residues Leu28 to Ser154 fold into the
amino-terminal core domain (the N-domain). Subsequent residues,
Met155 to Val276, are folded into a long loop (the loop domain)
with some α-helical and β-strand regions. Residues Ser277 to
Arg287 complete the N-domain. Residues Arg288 to Glu417 fold
into the carboxy-terminal core domain (the C-domain). According
to this domain subdivision, the N-domain, the loop domain, and
the C-domain consist of 138, 122 and 130 residues,
respectively.
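
The residue counts quoted for the three domains follow directly
from the sequence ranges given above. The short Python sketch
below is only an illustrative check: the dictionary layout and
the function name are our own choices, and the ranges are taken
from the text.

```python
# Inclusive residue ranges for each domain, as stated in the text.
DOMAIN_SEGMENTS = {
    "N-domain": [(28, 154), (277, 287)],   # two sequence fragments
    "loop domain": [(155, 276)],
    "C-domain": [(288, 417)],
}


def domain_size(segments):
    """Count residues in a list of inclusive (first, last) ranges."""
    return sum(last - first + 1 for first, last in segments)


sizes = {name: domain_size(segs) for name, segs in DOMAIN_SEGMENTS.items()}
modelled = sum(sizes.values())

for name, size in sizes.items():
    print(f"{name}: {size} residues")
print(f"modelled residues per subunit: {modelled}")
```

The total equals the 417 residues of the subunit minus the 27
disordered amino-terminal residues that precede Leu28.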

The two core domains have the same topology: a five-stranded
mixed β-sheet covered by helices on both sides (Fig. 4). This
topology is actually very simple, with the following secondary
structure elements observed sequentially: βαβαβαββ, referred to
as β1α1β2α2β3α3β4β5 (Fig. 2). The five β-strands form a mixed
β-sheet consisting of one antiparallel and four parallel
β-strands. The topology can be written as +3x, +1x, -2x, -1
according to the nomenclature of Richardson [17]. The three
crossover connections are three α-helices: α1 and α2 on one
side of the sheet and α3 on the other. In the C-domain the turn
between β-strand 4 and β-strand 5 (abbreviated to



Table 1. Refinement statistics.

Protein atoms                                  5201
Solvent atoms                                  0
Resolution range                               8 Å-2.8 Å
R-factor(a) (no. of reflections)               19.8 % (16479)
Free R-factor(b) (no. of reflections)          33.4 % (1792)
Geometric parameters
  Rms deviation from ideal bond lengths        0.012 Å
  Rms deviation from ideal bond angles         2.04°
  χ1-χ2 imperfection values(c)
    subunit-1                                  34.4°
    subunit-2                                  34.5°
Residues with φ,ψ outside allowed region       11
Rms ΔB for all atoms                           5.0 Å²
  backbone atoms                               3.5 Å²
  side-chain atoms                             6.9 Å²
Average B-factor, all atoms                    25.1 Å²
  main-chain atoms                             23.6 Å²
  side-chain atoms                             29.3 Å²

(a) R-factor = Σ|F_obs - F_calc| / Σ|F_obs|. (b) The free
R-factor has been calculated from 1792 reflections which were
not included in the X-PLOR calculations throughout the
refinement. (c) The χ1-χ2 imperfection value [44] is the rms
difference between observed χ1-χ2 values and the nearest
preferred cluster values, as observed in a database of well
refined structures.

Fig. 3. Schematic diagram of the thiolase dimer. In this side
view, the dimer two-fold axis runs vertically, between the two
subunits; the amino terminus (Leu28) and the carboxyl terminus
(Glu417) are at the bottom and the active sites of both
subunits are at the top.



Cβ4 and Cβ5) is just a short turn, consisting of three
residues. However, in the N-domain, this is the place where the
loop domain is inserted. In the N-domain, β3 is well defined
and is the edge strand of the sheet in the subunit, but in the
dimer it is also hydrogen bonded to its two-fold related
equivalent strand of the second subunit. In the C-domain, β3 is
located at the surface with just a few hydrogen bonds between
it and Cβ2.

Fig. 4. The secondary structure elements. The broken lines
indicate the regions which could not be built owing to lack of
density. The nomenclature of the secondary structure elements
is the same as in Fig. 2.

The N-domain and C-domain can be superimposed on each other via
an approximate two-fold axis (by rotating through 178° and
translating along the two-fold axis by 1.0 Å) (Fig. 5). Most of
the equivalent residues are in the β-strands and α-helices of
the two core domains. The root mean square (rms) fit for the 73
equivalenced Cα atoms is 1.7 Å, although the sequence identity
is only 11 %. The domain two-fold axis runs approximately
parallel to helix α3. Assembly of the N- and C-domains
generates a compactly folded, five-layered structure with the
α3 helices in the center layer, sandwiched between the two
β-sheets.



The three α-helical layers each contain two helices. All six
helices point in the same direction. The two α3 helices of the
central layer are orientated with their amino termini towards
the active site. The loop domain is also on this side of the
molecule (Fig. 6).
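
To give a feel for how close the reported domain superposition
(a 178° rotation combined with a 1.0 Å translation along the
axis) is to an exact two-fold, the following Python sketch
applies both operations to one point. It is an illustration
under stated assumptions only: the axis is placed on z for
convenience, and the test point is hypothetical rather than a
coordinate from the structure.

```python
import math


def screw_about_z(point, angle_deg, shift):
    """Rotate a point about the z-axis by angle_deg, then shift it along z."""
    x, y, z = point
    a = math.radians(angle_deg)
    return (
        x * math.cos(a) - y * math.sin(a),
        x * math.sin(a) + y * math.cos(a),
        z + shift,
    )


# Hypothetical point 10 Å from the axis (not taken from the structure).
p = (10.0, 0.0, 0.0)

observed = screw_about_z(p, 178.0, 1.0)   # operation reported in the text
exact = screw_about_z(p, 180.0, 0.0)      # ideal two-fold, for comparison

deviation = math.dist(observed, exact)
print(f"displacement from the exact two-fold position: {deviation:.2f} Å")
```

For a point at this radius the displacement is dominated by the
1.0 Å translation; the 2° shortfall in rotation contributes
more as the distance from the axis grows.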

The electron density map does not provide a complete chain
tracing for the loop domain. In both subunits, residues 162-173
at the beginning of this loop (immediately after Nβ4), residues
193-200 (in the middle of the domain) and residues 255-273 at
the end of the loop domain (just before Nβ5) are disordered
(see Fig. 4). Some other loop domain residues in subunit-1
(248-254) and subunit-2 (174, 184, 185) also could not be
built. The structure of the ordered part of the loop domain is
not the same in both subunits (as shown in Fig. 7), which
correlates with differences in crystal contacts of the two
subunits. The loop domain residues have higher mean B-factors
than other regions of the structure (Fig. 8). The loop domain
encircles the active site (Fig. 9). A substantial fraction of
the loop domain residues could not be built (40 out of 122),
but some secondary structure is still present (Fig. 4). After
the second break in the main chain, the longest helix (Lα3;
residues 202-220) of the entire molecule is seen. This helix
lies at one extremity of the molecule, far from the dimer
two-fold axis (Fig. 9). The beginning and end of the loop
domain are close to the active site pocket (Figs 9 and 10).

Active site pocket 

The position of the active site pocket can be located from the
patterns of conserved residues (Fig. 2) and also from studies
of the reaction mechanism of the homologous Z. ramigera
thiolase, which have identified two cysteines and a histidine
as being important for catalysis [9]. The equivalent residues
in yeast thiolase are Cys125, His375 and Cys403. These residues
form the base of a pocket surrounded by the loop domain. Most
of the active site residues are located in the C-domain
(Table 2).




Fig. 5. Superposition of the N-domain (green) and C-domain
(red). The numbers refer to the N-domain, except for 375 and
403 which indicate the positions of the catalytic residues
His375 and Cys403 in the C-domain. The N-domain (residues 28 to
287) is interrupted between residues 154 and 277 by the
presence of the loop domain.



A detailed view of this pocket is shown in Fig. 10. Cys403 is
rather buried, whereas Cys125 and His375 are more exposed to
solvent. At a resolution of 2.8 Å not all the atomic positions
can be known accurately; therefore the distances given below
are approximate, but they do provide important information
about the active site architecture, as visualized in Fig. 10.
The side chains of Cys125 and His375 are close together; for
example, in subunit-1 the distance between Sγ(Cys125) and
Nε2(His375) is 4.1 Å. Other polar atoms near Sγ(Cys125) are
O(Cys403) and N(Gly405), which are 4.4 Å and 3.6 Å away,
respectively. The distance from Nε2(His375) to Sγ(Cys403) is
7.5 Å and the distance between Sγ(Cys125) and Sγ(Cys403) is
6.2 Å. Polar atoms close to Sγ(Cys403) are O(Met315),
N(Gly316), O(Gly316), Oε1(Gln349) and Nε2(Gln349), which are
4.1 Å, 4.2 Å, 4.0 Å, 4.2 Å and 3.9 Å away, respectively. As
shown in Fig. 10, the side chain of Cys403 is between the side
chains of Met315 and Gln349. Interestingly, Met315 and Gln349
are completely conserved in 16 (out of 21) thiolase sequences
and are replaced by phenylalanine and valine, respectively, in
the other five known sequences.

The floor of the active site pocket is defined by the residues
His375, Cys125 and Cys403. The center of the active site is
chosen as the center of mass of Sγ(Cys125), Sγ(Cys403) and
Nε2(His375). At least two more methionines are close to the
active site: Met402 (just below the floor of the active site)
and Met155 (near His375). Met408 is also near the active site.
The Sδ atoms of all five methionines of the ordered part of
thiolase actually occur within 11 Å of the active site center.
In addition, five (out of eight) Sγ atoms of cysteines are
within 13 Å of the active site. The relevance of this preferred
occurrence of methionines and cysteines near the active site is
not clear.
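
The definition of the active site center can be illustrated
with the three interatomic distances reported for subunit-1.
The Python sketch below places the three atoms in an arbitrary
local plane that reproduces those distances and then takes
their mass-weighted mean position. It is a sketch under stated
assumptions: the planar frame is not the crystallographic
frame, the atomic masses are standard values that we assume
were used for the weighting, and all names are illustrative.

```python
import math

# Distances (Å) reported in the text for subunit-1.
D_S125_N375 = 4.1   # Sγ(Cys125) to Nε2(His375)
D_S125_S403 = 6.2   # Sγ(Cys125) to Sγ(Cys403)
D_N375_S403 = 7.5   # Nε2(His375) to Sγ(Cys403)

# Standard atomic masses (u); assumed weighting for the center of mass.
MASS = {"S": 32.06, "N": 14.007}


def place_triangle(d_ab, d_ac, d_bc):
    """Place points A, B, C in a plane so that they reproduce three distances."""
    a = (0.0, 0.0)
    b = (d_ab, 0.0)
    cx = (d_ab**2 + d_ac**2 - d_bc**2) / (2.0 * d_ab)   # law of cosines
    cy = math.sqrt(d_ac**2 - cx**2)
    return a, b, (cx, cy)


s125, n375, s403 = place_triangle(D_S125_N375, D_S125_S403, D_N375_S403)

atoms = [("Sγ(Cys125)", s125, MASS["S"]),
         ("Nε2(His375)", n375, MASS["N"]),
         ("Sγ(Cys403)", s403, MASS["S"])]

total_mass = sum(m for _, _, m in atoms)
center = (sum(p[0] * m for _, p, m in atoms) / total_mass,
          sum(p[1] * m for _, p, m in atoms) / total_mass)

for name, pos, _ in atoms:
    print(f"{name}: {math.dist(center, pos):.1f} Å from the center")
```

Because only distances enter the construction, the resulting
center-to-atom separations do not depend on the choice of local
frame.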

There are no charged polar atoms of lysine, arginine, glutamate
or aspartate residues within 10 Å of the active site. The
nearest atom of this kind is Oε1(Glu153), which is 12 Å away.
The Glu153 side chain is actually shielded from the active site
pocket by Leu377. Solvent accessibility studies show that
Glu153 has no solvent accessible surface. Its Oε1 and Oε2 atoms
are hydrogen bonded to N(Val93) and N(Leu94), and to N(Gly378),
which lies at the amino terminus of the active site helix, Cα3.

A peculiar feature of the active site is its position with
respect to the two central α3-helices (Figs 9 and 10). These
two helices point with their amino termini towards the active
site, suggesting that the helical dipole [18] might assist in
catalysis. However, none of the three catalytic residues is
positioned on the helical axis. The loops from β3, leading into
helix α3, contain the catalytic residues, Cys125 in the
N-domain and His375 in the C-domain. This loop is only four
residues long in the N-domain, but comprises 14 residues in the
C-domain. Both loops have a residue with a positive φ: residue
Gln124 in the N-domain (φ,ψ = 73°, -132°) and



Fig. 6. Schematic illustration of the five-layered structure of
thiolase. The active site is located near the amino-terminal
ends of the green helices of the center layer. The secondary
structure elements of the loop domain are in lilac. (Figure
prepared by JPh Zeelen.)



Table 2. Active site residues.

Domain  Region             Residues(a)
N       Loop Nβ2 to Nα2    Val93
N       Loop Nβ3 to Nα3    Arg123 Gln124 Cys125 Ser126 Ser127
L       Loop before Lα2    Met186 Thr189
C       Loop Cβ1 to Cα1    Met315 Gly316 Pro319
C       Loop Cβ2 to Cα2    Asn343 Glu344 Ala345 Phe346 Gln349
C       Loop Cβ3 to Cα3    Ala370 His375 Pro376 Leu377 Gly378
                           Thr380 Gly381 Gln384
C       Loop Cβ4 to Cβ5    Ser401 Met402 Cys403 Ile404 Gly405
                           Thr406 Gly407 Met408 Gly409

(a) All residues with atoms within 9 Å of the center of the
active site of subunit-1 are tabulated. The bold residues have
atoms within 5 Å of the active site center.



residue Leu377 in the C-domain (φ,ψ = 7°, -76°). Gln124 and
Leu377 are in equivalent positions with respect to the overall
fold of the N- and C-domains, as well as with respect to the
beginning of helix α3 (Fig. 2). The structures of these
residues are well defined by the electron density map;
therefore it seems that some strain in the folded protein is
required at these positions in order to achieve proper
positioning of the active site residues [19].

Subunit-subunit interactions 

The two subunits make numerous interactions including
hydrophobic contacts, hydrogen bonds, and salt bridges. There
are 239 intersubunit atom pairs within a cutoff distance of
4 Å. As shown in Fig. 11, three layers of interactions can be
recognized. In the first layer, on the side of the molecule
where the two active site pockets are located (Fig. 3), two
loops approach and contact each other (Fig. 9). These loops are
between Nβ2 and Nα2. In particular, residues 95 to 98, which
form a type II β-turn, interact with each other through
hydrogen bonds between main chain atoms. The helix Nα2 also
interacts with residues from the loop domain of the other
subunit (Fig. 9). For example, Cβ(Ala179) and Sγ(Cys182) of the
loop domain fit into a pocket formed by hydrophobic atoms of
the side chains of Arg104, Ala105, Leu108 and Ala109 of Nα2 of
the other subunit. Most of the interactions are seen in the
second layer. In this layer, the Nβ3 strands of subunit-1 and
subunit-2 form an antiparallel pair of β-strands, related by
the dimer two-fold axis (Fig. 9). This interaction means that
the mixed β-sheets of the two subunits are continuous, forming
a 10-stranded mixed β-sheet. The third layer of interactions is
between side chains at the carboxyl termini of the two active
site Nα3 helices (Fig. 11). The charged side chains of the Nα3
residues Asp134 and Lys138 point to a polar pocket between the
subunits. A salt bridge network is observed from Oδ1(Asp134) to
Nζ(Lys138) to Oε2(Glu87) (within the same subunit) to
Nη2(Arg123) of the other subunit. From this analysis it is
clear that most of the subunit-subunit interactions are
mediated by contacts between atoms of the two N-domains.

Fig. 7. The Cα-differences between the two subunits. The Cα-Cα
distance (in Å) between equivalent Cα atoms of subunit-1 and
subunit-2 is plotted on the vertical axis against the residue
number. The largest differences are in the loop domain
(residues 155 to 276).

Fig. 8. The average main chain (N, Cα, C, O) B-factors (Å²) of
subunit-1 (continuous line) and subunit-2 (discontinuous line)
plotted against residue number.
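
A contact count of the kind quoted above (atom pairs from
different subunits lying within a 4 Å cutoff) can be obtained by
a simple pairwise distance search. The Python sketch below
shows the idea on toy data. It is not the procedure used for
the structure: the function name is our own, the coordinates
are hypothetical, and treating a pair at exactly the cutoff as
a contact is an assumption.

```python
import math


def count_contacts(atoms_a, atoms_b, cutoff=4.0):
    """Count atom pairs (one atom from each set) separated by at most cutoff Å."""
    return sum(
        1
        for a in atoms_a
        for b in atoms_b
        if math.dist(a, b) <= cutoff
    )


# Hypothetical coordinates in Å (toy data, not from the thiolase model).
subunit_1 = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
subunit_2 = [(0.0, 3.5, 0.0), (3.0, 3.9, 0.0), (20.0, 0.0, 0.0)]

n_contacts = count_contacts(subunit_1, subunit_2)
print(f"atom pairs within 4 Å: {n_contacts}")
```

The double loop scales with the product of the two atom counts,
which is adequate for a single dimer interface; for larger
systems a spatial grid or tree search would usually be
preferred.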



Discussion 

The thiolase subunit is folded as a five-layered structure,
with outermost and innermost layers consisting of α-helices,
with β-sheets in between. At least one other enzyme, inositol
monophosphatase, also adopts a five-layered αβαβα structure
[20]. However, the topology of the fold of inositol
monophosphatase is completely different. In thiolase, two core
domains of identical topology can be recognized. The core
domains each consist of five β-strands and three α-helices
(Fig. 4). The loops between the secondary structure elements
are of different lengths in the N-domain and the




Fig. 9. The thiolase dimer. The view is along the dimer
two-fold axis, towards the two active sites. The amino and
carboxyl termini are on the side of the molecule furthest from
the viewer. (a) Schematic drawing of the thiolase dimer with
subunit-1 in green (the N-domain), yellow (the loop domain),
and red (the C-domain). Subunit-2 is in blue. The dotted yellow
lines are the regions of the loop domain which could not be
built. The β-strands of the green and red core domains are
numbered. The two active site helices, helix Nα3 and Cα3, are
marked by asterisks near their amino ends. The side chains of
the active site residues [Cys125 (green domain), His375 (red
domain) and Cys403 (red domain)] of both active sites are drawn
in white. (b) Stereo Cα-trace of the thiolase dimer with the
same color scheme as in (a). The residues at the
discontinuities of the yellow loop domain are labeled. Other
markers along the Cα-trace are position 80 (before Nβ2), Cys125
(after Nβ3), His375 (after Cβ3) and Cys403 (after Cβ4).



Fig. 10. The active site of subunit-1. The main chain coloring
scheme is the same as in Fig. 9. The active site residues
Cys125, His375 and Cys403 are labeled. Met186 and Ala370 are
also labeled. The displayed side chains are within 9 Å of the
center of the active site (Table 2). The active site His375
side chain is surrounded by the side chains of Ala370, Gln343,
Cys125 and Leu377. The Cys125 side chain is close to the side
chains of His375, Met186, Gln124 and Leu377. The Cys403 side
chain is near Gln349 and Met315. Distance information is
provided in the text. The two active site helices Nα3 and Cα3
follow Cys125 and His375, respectively.




C-domain. The catalytic residues are found in or near the loops
immediately following the β-strands. The topology of the core
domain is also present in the first domain of
phosphoglucomutase, as detected by the program DALI [21].
According to the structural alignment, shown in Fig. 12, there
are 77 equivalent residues (only 7 % sequence identity) which
superimpose with an rms difference between Cα positions of


3.1 Å. Phosphoglucomutase is a four-domain enzyme [22] in which
domains 2 and 3 have similar topologies to domain 1, but lack
the first strand. The assembly of these domains into the
complete structure is very different from the thiolase fold.
Nevertheless, helix α3 (between β3 and β4) in the first domain
of phosphoglucomutase points towards the active site. Also, the
turn between β4 and β5 is of particular importance,





Fig. 11. Side view of the dimer (same view as Fig. 3). The two
subunits are shown in pink and blue. The three layers at the
dimer interface are highlighted in yellow. The active site face
is uppermost; the amino and carboxyl termini (labeled as N and
C, respectively) are visible in the lower part of the figure.
The amino terminus (Leu28) points into the solvent region. At
the top, the yellow loops in contact are between Nβ2 and Nα2.
The yellow β-strands of the middle layer are the Nβ3 strands
and the dimer interface helices of the bottom layer are the
carboxy-terminal ends of the Nα3 active site helices.




Fig. 12. Stereo superposition of the C-domain of thiolase
(thick lines) with the amino-terminal domain of
phosphoglucomutase (thin lines). The thiolase domain is labeled
at residues Pro297, His375 (active site), Cys403 (active site),
and Glu417. The phosphoglucomutase domain [22] is labeled at
residues Ala10, Ser116 (active site) and Asn134.



because it contains the active site serine (Ser116). In
thiolase, the loop from Nβ3 continuing into Nα3 contains an
active site cysteine (Cys125) and the active site histidine,
His375, is found in the equivalent loop of the C-domain. The
turn between Nβ4 and Nβ5 constitutes the loop domain. This is a
short turn in the C-domain and immediately follows the second
active site cysteine, Cys403.

The active site of thiolase has been identified and its floor
is defined by Cys125, His375 and Cys403. The gold-cyanide ion,
Au(CN)2-, and the gold-chloride ion, AuCl4-, bind in this
pocket between Sγ(Cys403) and Sγ(Cys125). Not surprisingly,
gold-chloride is an inhibitor of the thiolase reaction, showing
50 % inhibition at micromolar concentrations. From the covalent
modification studies of pig heart thiolase II [4] and
Z. ramigera thiolase [23] it can be concluded that Cys125 of
yeast thiolase will be acylated after the first half reaction
(Fig. 13) has been completed. Further mechanistic studies with
Z. ramigera thiolase [9] indicate that Cys403, and probably
also His375, are important catalytic residues. The crystal
structure of yeast thiolase confirms that these residues are in
the active site pocket and that the two cysteines are within
14 Å of each other, in agreement with spectroscopic
measurements [8]. The active site pocket is a small surface
patch, shaped by residues of the two core domains and
surrounded by the loop domain, all from the same subunit
(Fig. 9). The substrate, 3-ketoacyl-CoA, is a large molecule.
Therefore, it seems likely that some residues of the loop
domain are important for binding the substrate. Evidence for
this also comes from the observation that 11 residues of the
loop domain are conserved in 21 thiolase sequences (Fig. 2).
Two of these residues, Gly266 and Ala270, are in the disordered
part of the loop domain. Since these residues are conserved in
sequences of both thiolase I and thiolase II, they are probably
important for the binding of the CoA moiety, which is common to
the substrates of both enzymes, rather than for the binding of
the fatty acid moiety, which is only present in the substrate
of thiolase I.

Although the two active site helices form the central layer,
these helices are not particularly hydrophobic (Fig. 2). A
buried arginine (Arg383) is observed near the amino terminus of
Cα3, which forms a salt bridge with Glu341. Arg383 is a
conserved residue (Fig. 2), whereas Glu341 is either a
glutamate or an aspartate in all sequences. Nα3 has charged
residues at its carboxyl terminus. These residues face a polar
pocket between the two subunits (Fig. 11) and are involved in a
salt bridge network across the dimer interface.

Comparison of the sequence alignment of the seven peroxisomal
3-ketoacyl-CoA thiolases with the sequence alignment of 21
thiolases deposited in databanks (data not shown) shows that
the seven peroxisomal thiolases are a much closer family (152
identical residues out of 417) compared with the complete set
(40 identical residues out of 417). These 40 completely
conserved residues are circled in Fig. 2 and many are near the
active site.
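
Expressed as percentages of the 417-residue sequence, the two
conservation figures quoted above can be compared directly. The
short Python sketch below is only a worked calculation; the
variable names are our own and the counts are taken from the
text.

```python
SEQUENCE_LENGTH = 417  # residues per yeast thiolase subunit

# Numbers of identical residues reported for each alignment.
identical = {
    "seven peroxisomal thiolases": 152,
    "all 21 thiolases": 40,
}

percent_identity = {
    name: 100.0 * count / SEQUENCE_LENGTH
    for name, count in identical.items()
}

for name, pct in percent_identity.items():
    print(f"{name}: {pct:.1f} % strictly conserved positions")
```

On this measure roughly one third of all positions are
invariant within the peroxisomal family, against roughly one
tenth across the complete set.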

Another common feature of the seven peroxisomal sequences is
the amino-terminal extension of 30 to 50 residues before the
beginning of the first β-strand (Nβ1), while only a few more
residues are present after the last β-strand (Cβ5). As is shown
in Figs 3 and 11, the amino and carboxyl termini are close
together and exposed to the solvent. Leu28 is the first residue
at the amino terminus that is visible in the electron density
map. The observation that the protein in the thiolase crystals
is a mixture of the complete primary translation product and
polypeptides starting with amino acid residues 5 and 7 is
interesting, because this thiolase has been shown to possess an
amino-terminal peroxisomal targeting signal (PTS2). The
occurrence of a mixture of amino termini probably results from
the massive over-expression (about 100-fold) of the thiolase
from a multicopy vector. Immunoelectron microscopy revealed
that thiolase was present not only in peroxisomes but
throughout the cells, with the exception of the mitochondria
(M Veenhuis and W-H Kunau, unpublished data). If thiolase
undergoes processing upon import into peroxisomes, leading to a
mature form which lacks the six amino-terminal amino acids,
this would explain why a major fraction of the enzyme starts
with amino acid Ser7. However, it cannot be excluded that
cleavage of the first six amino acids may have resulted from
limited proteolysis during protein purification.

It has been demonstrated that the first 16 amino acid 
residues of thiolase are essential and sufficient for import 
into yeast peroxisomes [15,16]. The fact that the first 27 
residues of the thiolase structure are disordered and 
exposed to the solvent should allow the interaction of 
the targeting signal with its putative receptor even in the 
folded state of the enzyme. On the basis of genetic 
evidence, a possible candidate for the PTS2 receptor is 
the product of the PAS7 gene [24]. Recent data suggest 
that the Pas7 protein is cytosolic and only binds to per- 
oxisomes in the presence of thiolase (M Marzioch, 
R Erdmann, M Veenhuis and W-H Kunau, unpublished 
data). Thus, it now seems important to investigate 
whether the Pas7 protein binds thiolase in a PTS2- 
dependent manner and whether such an interaction 
requires folded or unfolded thiolase. 

Genetic diseases have been found which are associated 
with defects in the genes for human peroxisomal 



R-C-C(=O)-C-C(=O)-SCoA + -S-Cys125-enzyme →
        R-C-C(=O)-S-Cys125-enzyme + C-C(=O)-SCoA        (1)

R-C-C(=O)-S-Cys125-enzyme + -SCoA →
        R-C-C(=O)-SCoA + -S-Cys125-enzyme               (2)

Fig. 13. The two half reactions of thiolase I (adapted from
[9]). In reaction (1), acetyl-CoA is split off from a
ketoacyl-CoA molecule and a covalent acyl-enzyme intermediate
is formed. In reaction (2), CoA reacts with the covalent
intermediate and an acyl-CoA molecule is released.




Fig. 14. The final averaged and solvent-flattened 3.1 Å MIRAS
map, contoured at 1.2σ. Residues 42, 282, 150, 90, 120 are in
the β-strands of the N-domain of subunit-1 and residue B122 is
in β-strand Nβ3 of subunit-2.



3-ketoacyl-CoA thiolase [25] and for human mitochon- 
drial acetoacetyl-CoA thiolase [26]. In the latter case, 
these genetic defects, which cause severe health 
problems, have been attributed to point mutations. In 
one patient, a Gly→Arg mutation was found at a 
position equivalent to residue Asn176 [27]. This position 
is reasonably well conserved in the non-peroxisomal 
thiolases. In another patient, an Ala→Thr mutation was 
found [28]. The equivalent residue in yeast thiolase is 
Ala370, which is close to the active site His375 (Fig. 10), 
and conserved in all thiolases (Fig. 2). Experimentally, it 
is found that both the Gly→Arg mutation and the 
Ala→Thr mutation result in thiolase variants which are 
less stable than wild-type thiolase. From the structure it 
can be seen that the Ala370→Thr mutation could 
influence the allowed side chain conformation of 
His375, and therefore interfere with the proper func- 
tioning of the thiolase. The consequences of the 
Gly176→Arg mutation are not trivial to explain from 
the structure of the dimeric yeast thiolase, because it is a 
surface residue (at the beginning of the loop domain). 
Since this point mutation is observed in a tetrameric 
thiolase, it is possible that this region of the molecule is 
buried at one of the additional protein interfaces of the 
tetramer. However, the quaternary structure of the 
tetramers is unknown. The structure of the yeast thiolase 
dimer, described in this paper, might provide some clues 
as to how the dimers could assemble into tetramers. 



Biological implications 

Thiolases play an essential role in the biodegrada- 
tion of fatty acids in the β-oxidation pathway, as 
well as in biosynthetic pathways, where they are 
important for the Claisen condensation reaction 
between two molecules of acetyl-coenzyme A 
(acetyl-CoA) to form acetoacetyl-CoA, which is 
subsequently used as a building block for further 
synthesis. The biosynthetic and biodegradative 
thiolases have similar sequences, and use the same 
reaction mechanism. 

The crystal structure of the biodegradative 
3-ketoacyl-CoA thiolase of Saccharomyces cerevisiae 
peroxisomes provides the first description of the 
three-dimensional architecture of this class of 
enzymes. This thiolase is dimeric and is involved 
in the β-oxidation pathway. The extensive interac- 
tions that are observed across the dimer interface 
suggest that it is a very stable dimer. 

The thiolase dimer has one extensive and rather 
flat side, perpendicular to the dimer two-fold 
axis. Both active sites are located on this side of 
the molecule, near the dimer interface, whereas 
the amino and carboxyl termini of both subunits 
are found on the opposite side. A mutation in the 
thiolase gene causing severe health problems, 
such as ketoacidotic attacks at the age of six 
months followed by severe retardation, has been 
mapped to the active site region and the structure 
provides an explanation for the detrimental 
effects of this mutation. The amino-terminal 27 
residues are present in the crystallized protein, 
but are not visible in the electron density map. 
These residues also contain the peroxisomal 
targeting sequence. The crystal structure shows 
that this targeting sequence is disordered in the 
folded thiolase structure. 

In the unliganded structure described here, the 
active site is a shallow pocket surrounded by 
residues of the loop domain. Some residues of 
this loop domain are disordered, so the exact 
shape of the binding pocket is unknown. Some of 
these disordered residues are completely 
conserved in 21 thiolase sequences, suggesting 
that these loop domain residues are important for 



the formation of the enzyme-ligand complex 
which then must acquire its specific structure by 
an induced fit mechanism. The crystal structure 
shows that the side chains of three conserved 
catalytic residues, Cys125, His375 and Cys403, 
form the base of the active site pocket. The 
special properties of Sγ(Cys125) and Sγ(Cys403) 
in the two-step reaction must be related to their 
interactions with neighboring main chain and side 
chain atoms. More mechanistic and structural 
binding studies will be required to elucidate 
further details of the reaction mechanism. 



Materials and methods 

Expression and purification 

In order to facilitate a large-scale purification of thiolase 
(Fox3p) we constructed a thiolase overproducing S. cerevisiae 
strain, AY-FOX3, in the following way: a 2.5 kb genomic 
Sau3A fragment, functionally complementing thiolase- 
deficient mutants of S. cerevisiae (R Erdmann & W-H Kunau, 
unpublished data), was subcloned into the BclI-digested pCS19 
vector [29]. This subcloning introduced an XhoI site at the 5' 
end (about 900 bp upstream of the BclI site) and a BamHI site 
at the 3' end (about 600 bp downstream of the BclI site) of the 
gene. Subsequent XhoI/BamHI digestion resulted in a 4.0 kb 
fragment which was inserted into a BamHI/SalI cut multi- 
copy vector, YEp352 [30], to form YEp352-FOX3. 
Transformation of the protease-deficient S. cerevisiae strain 
ABYS56 (a gift from D Wolf, Stuttgart, Germany) with 
plasmid YEp352-FOX3 yielded the strain AY-FOX3. Growth 
on oleate as the carbon source [31] resulted in a 70-fold to 90- 
fold increase in thiolase activity over wild-type levels. Thiolase 
was purified from 8-10 g of oleate-grown cells (wet weight) 
according to a published procedure [31]. 

Crystallization, data collection and heavy atom 
derivatives 

The crystals were grown using a microdialysis method [32,33]. 
The protein solution [5 mg ml⁻¹ in a buffer of 200 mM 
potassium phosphate pH 7.4, 1 mM dithiothreitol (DTT), 
1 mM NaN₃, 1 mM EDTA] was dialyzed against 1 ml of a 
buffer of 25 mM 3-(N-morpholino)-propanesulfonic acid 
(MOPS) pH 6.5, 1 mM DTT, NaN₃, EDTA. The space 
group of the crystals is P2₁2₁2₁, with cell dimensions of 
71.78 Å, 93.72 Å and 120.45 Å, and with one dimer per asym- 
metric unit. 
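
As a quick consistency check on these crystal parameters, the unit cell volume and the packing density can be estimated from the numbers above. The sketch below is only an illustration: it assumes four asymmetric units per cell (as expected for space group P2₁2₁2₁) and a subunit mass of about 44.6 kDa (the upper mass spectrometry value quoted in this section). The resulting Matthews coefficient is not reported in the text, and the function names are hypothetical.

```python
# Orthorhombic cell dimensions from the text, in angstroms.
a, b, c = 71.78, 93.72, 120.45

# Assumptions (not stated explicitly in the text):
#   - space group P2(1)2(1)2(1) has 4 asymmetric units per unit cell
#   - subunit mass ~44.6 kDa, so one dimer is ~89.2 kDa
ASYMMETRIC_UNITS_PER_CELL = 4
SUBUNIT_MASS_DA = 44_600
SUBUNITS_PER_ASU = 2  # one dimer per asymmetric unit


def cell_volume(a, b, c):
    """Volume of an orthorhombic cell (all angles 90 degrees)."""
    return a * b * c


def matthews_coefficient(volume, n_asu, mass_per_asu):
    """Crystal volume per unit of protein mass (cubic angstrom per dalton)."""
    return volume / (n_asu * mass_per_asu)


volume = cell_volume(a, b, c)
vm = matthews_coefficient(
    volume, ASYMMETRIC_UNITS_PER_CELL, SUBUNITS_PER_ASU * SUBUNIT_MASS_DA
)
print(f"cell volume = {volume:.0f} A^3, V_M = {vm:.2f} A^3/Da")
```

Under these assumptions the estimate comes out near 2.3 Å³ per dalton, which is compatible with one dimer per asymmetric unit.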

The crystals grow at room temperature in about one week. 
Mass spectrometry measurements using protein from dissolved 
crystals show that the crystallized yeast thiolase consists of a 
mixture of proteins with molecular weights varying from 
44.6 kDa to 44.0 kDa. This is in good agreement with the 
amino-terminal amino acid sequence analysis of dissolved 
crystals, which also showed a mixture of fragments, starting at 
positions 2, 5 and 7 with predominant occurrence of the 
sequence SIKD, which corresponds to residues 7 to 10. 
Clearly, the crystallized protein includes all residues after Ser7. 
The complete protein, starting at Ser2 is only present with an 
estimated occupancy of less than 50 %. 

In order to prevent deterioration of crystal quality the DTT 
concentration of the dialysis buffer was raised to 2 mM one 
week after setting up the crystallization experiment. Every 
month, the dialysis solution was refreshed in order to be sure 
of a high concentration of DTT in its reduced state. Three 
days before data collection, the DTT concentration was raised 
to 200 mM. A complete native dataset was collected from one 
crystal with a maximum resolution of 2.8 Å (Table 3). The 
procedure resulting in the first interpretable MIRAS map has 
been described elsewhere [34]. Most of the calculations were 
done with CCP4 programs [SERC (UK) Collaborative 
Computing Project 4, Daresbury Laboratory UK 1979]. 
Briefly, poor MIRAS phases at 3.1 Å resolution, with 
<m> = 0.47, were calculated from six heavy atom derivatives 
(Table 4). The best heavy atom derivative was gold-chloride, 



Table 3. Statistics of the native dataset 

Total observed reflections (resolution 30 Å-2.8 Å)    47231 
Total unique reflections (resolution 30 Å-2.8 Å)      18981 
Merging R-factor*                                     7.9% 
Overall completeness (resolution 30 Å-2.8 Å)          92.0% 
Overall completeness (resolution 2.9 Å-2.8 Å)         92.6% 
Space group                                           P2₁2₁2₁ 
Cell dimensions (Å)                                   71.78, 93.72, 120.45 

*$R_{\mathrm{merge}} = \sum_h \sum_i |\langle I \rangle_h - I_{h,i}| / \sum_h \sum_i |\langle I \rangle_h|$ 
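
To make the merging R-factor concrete, the following minimal sketch computes it from repeated intensity measurements, using the definition in the footnote to Table 3 (sum over reflections h and observations i of the deviation from the mean intensity, divided by the summed mean intensities). The intensities are invented toy values and the function name is hypothetical; this is not the CCP4 implementation. The last lines compute the average multiplicity implied by the reflection counts in Table 3.

```python
def merging_r_factor(observations):
    """R_merge = sum_h sum_i |<I>_h - I_hi| / sum_h sum_i |<I>_h|.

    `observations` maps a reflection index h to the list of its
    repeated intensity measurements I_hi.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        mean = sum(intensities) / len(intensities)
        for value in intensities:
            numerator += abs(mean - value)
            denominator += abs(mean)
    return numerator / denominator


# Toy data, invented for illustration only.
toy = {
    (1, 0, 0): [100.0, 110.0, 90.0],
    (0, 2, 0): [50.0, 50.0],
    (1, 1, 1): [200.0, 180.0],
}
r_merge = merging_r_factor(toy)

# Average multiplicity from the reflection counts in Table 3.
multiplicity = 47231 / 18981
print(f"toy R_merge = {r_merge:.3f}, multiplicity = {multiplicity:.2f}")
```

With roughly 2.5 observations per unique reflection, each mean intensity in the native dataset rests on only a few measurements.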



Table 4. Heavy atom data. 

Derivative      Maximum      R_merge^a   Number     R_cullis^b         Phasing power^c 
                resolution               of sites   (centric data)     centric/acentric 

KAuCl₄ (1)      2.71 Å       6.8%        …          0.75               1.2/0.9 
KAuCl₄ (2)      3.02 Å       8.4%        …          0.81               0.8/0.5 
KAu(CN)₂        2.71 Å       7.7%        …          0.80               0.8/0.6 
KAuI₄           3.23 Å       9.1%        …          0.97               0.6/0.5 
K₃UO₂F₅         4.02 Å       12.4%       …          0.86 (to 4 Å)      0.7/0.5 (to 4 Å) 
(NH₄)₂U₂O₇      3.10 Å       8.9%        …          0.94               0.5/0.4 

^a $R_{\mathrm{merge}} = \sum_h \sum_i |\langle I \rangle_h - I_{h,i}| / \sum_h \sum_i |\langle I \rangle_h|$. 
^b $R_{\mathrm{cullis}} = \sum ||F_{PH} \pm F_P| - F_H| / \sum |F_{PH} \pm F_P|$ for centric reflections. 
^c Phasing power $= \mathrm{rms}(F_H)/\mathrm{rms}(E)$, where $F_P$, $F_{PH}$, $F_H$ are the 
native, derivative and heavy atom structure factor amplitudes respectively and $E$ 
the lack of closure. 



with two heavy atom sites per dimer. The orientation and 
position of the local dimer two-fold axis were determined 
with the GLRF-package [35,36], using the self rotation 
function and translation function options, respectively. 
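
The heavy atom statistics defined in the footnotes of Table 4 can be illustrated with a short sketch restricted to centric reflections, for which the lack of closure reduces to the difference between the isomorphous difference |F_PH ± F_P| and the calculated heavy atom amplitude F_H. The amplitudes below are invented toy values, the function names are hypothetical, and the `crossover` flag (marking reflections where the sum rather than the difference applies) is an assumption made for this example; it is not how the phasing programs cited here store their data.

```python
from math import sqrt


def isomorphous_difference(fp, fph, crossover=False):
    """|F_PH +/- F_P| for a centric reflection.

    The sum applies only to the 'crossover' case, in which the heavy
    atom contribution flips the sign of the structure factor.
    """
    return fph + fp if crossover else abs(fph - fp)


def rms(values):
    """Root mean square of a list of numbers."""
    return sqrt(sum(v * v for v in values) / len(values))


def centric_statistics(reflections):
    """Return (R_cullis, phasing power) for centric reflections.

    Each reflection is a tuple (F_P, F_PH, F_H_calc, crossover).
    """
    diffs = [isomorphous_difference(fp, fph, x) for fp, fph, _, x in reflections]
    fh = [fh_calc for _, _, fh_calc, _ in reflections]
    closure = [d - f for d, f in zip(diffs, fh)]  # lack of closure E
    r_cullis = sum(abs(e) for e in closure) / sum(diffs)
    phasing_power = rms(fh) / rms(closure)
    return r_cullis, phasing_power


# Toy amplitudes, invented for illustration only.
toy = [
    (100.0, 120.0, 15.0, False),
    (80.0, 70.0, 12.0, False),
    (10.0, 20.0, 28.0, True),
]
r_cullis, power = centric_statistics(toy)
print(f"R_cullis = {r_cullis:.2f}, phasing power = {power:.2f}")
```

A lower R_cullis and a higher phasing power both indicate that the heavy atom model accounts for more of the observed isomorphous differences, which is the sense in which the first gold chloride derivative in Table 4 is the best one.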

The MIRAS map was much improved by two-fold averaging 
using program O [37]. A partial model was built in the 
averaged map. A complete model could eventually be built in 
a DEMON-averaged map [38], after improving the mask, 
optimizing the local transformation and changing the 
weighting scheme [34]. After trying to refine this structure, it 
appeared that parts of the loop domain were not related by the 
dimer two-fold axis. These regions were omitted from the 
model and another (dual-)mask was generated, allowing for 
averaging of the map in the regions of the core domains, no 
averaging in the regions of the loop domain, and solvent flat- 
tening in the regions outside the protein. This mask (and the 
associated averaged MIRAS map) was further improved itera- 
tively in the course of subsequent refinement. The averaged 
and solvent-flattened 3.1 Å MIRAS map calculated with the 
final mask was used as a reference map throughout the final 
model building sessions and refinement calculations. This map 



is of good quality. Fig. 14 shows the quality of this map near 
the β-sheet of the N-domain of subunit-1. 

Structure refinement and structure analysis 

The structure was refined in an iterative procedure using, 
alternately, X-PLOR calculations [39] and model building into 
electron density maps. The X-PLOR refinement was done 
first at 3.1 Å with strict non-crystallographic symmetry con- 
straints, and then at 2.8 Å with non-crystallographic symmetry 
restraints. The X-PLOR models were compared with 
SIGMAA-weighted 2Fo-Fc maps [40] in extensive model 
building sessions, using the O package [41]. Parts of the loop 
domain which had been omitted could be rebuilt in the 
electron density map as the refinement progressed. Some 
regions of the loop domain are well defined, for example the 
long helix Lα3 and the two β-strands Lβ1 and Lβ2. The 
model building in 2Fo-Fc maps was guided by positive and 
negative peaks in the corresponding Fo-Fc maps and by the 
averaged and solvent-flattened 3.1 Å MIRAS reference map. 
In the final model, there are four fragments missing in 
subunit-1, and five fragments in subunit-2, owing to weak or 
discontinuous density in the corresponding sections of the 
electron density map. 

The final model has an R-factor of 19.8 %, with a free 
R-factor of 33.4 % (Table 1). Water molecules have not been 
included in the refinement at this stage. The complete model 
consists of 5201 protein atoms. The residues missing from the 
model are 1-27, 162-173, 193-200 and 248-274 of subunit-1 
and 1-27, 162-174, 184-185, 193-200 and 255-273 of 
subunit-2. The Ramachandran plot of 
the remaining residues shows that there are 11 residues which 
are not close to the allowed regions: five in subunit-1 (50, 82, 
124, 377 and 395) and six in subunit-2 (50, 124, 177, 178, 
182, 251 and 377). These residues have higher than average 
main chain B-factors, except for residues 124 and 377 in 
subunit-1, and residues 50, 124, 377 in subunit-2. The 
average B-factors, calculated from all atoms, for the N- 
domains, loop domains and C-domains are 21 Å², 33 Å² and 
25 Å², respectively. The N-domains have the lowest average 
B-factor, which correlates with their position in the center of 
the dimer, near the dimer two-fold axis (Fig. 9). The average 
B-factors of subunit-1 and subunit-2 are 28 Å² and 22 Å², 
respectively. For the core domains there are only three signif- 
icant chain breaks at the 1.2σ level; they occur at positions 
151 (in Nβ4 of subunit-1), 145 (between Nα3 and Nβ4 of 
subunit-2) and 335 (between Cα1 and Cβ2 of subunit-2). As 
shown in Table 1, the geometry of the structure is in good 
agreement with ideal geometry. The two gold sites are in the 
two active site pockets, at a distance of 3.1 Å from 
Sγ(Cys125) and 2.4 Å from Sγ(Cys403) in subunit-1 and 
3.2 Å from Sγ(Cys125) and 2.5 Å from Sγ(Cys403) in 
subunit-2. 
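
The extent of the final model can be tallied from the residue ranges listed above. The sketch below is only a bookkeeping aid: it takes the 417 residues per subunit stated in the abstract and the missing ranges from this section, and derives how many residues are present in each subunit and the average number of atoms per modeled residue. The variable and function names are hypothetical.

```python
SUBUNIT_LENGTH = 417  # residues per subunit, as stated in the abstract
PROTEIN_ATOMS = 5201  # atoms in the complete model, from the text

# Inclusive residue ranges missing from the final model (from the text).
MISSING = {
    "subunit-1": [(1, 27), (162, 173), (193, 200), (248, 274)],
    "subunit-2": [(1, 27), (162, 174), (184, 185), (193, 200), (255, 273)],
}


def modeled_residues(length, missing_ranges):
    """Number of residues left after removing the inclusive missing ranges."""
    missing = sum(end - start + 1 for start, end in missing_ranges)
    return length - missing


counts = {name: modeled_residues(SUBUNIT_LENGTH, r) for name, r in MISSING.items()}
total = sum(counts.values())
atoms_per_residue = PROTEIN_ATOMS / total

for name, n in counts.items():
    print(f"{name}: {n} residues modeled, {len(MISSING[name])} missing fragments")
print(f"total = {total} residues, {atoms_per_residue:.2f} atoms per residue")
```

The number of ranges per subunit (four and five) matches the fragment counts given in the refinement description, and the atom count per residue is in the range expected for a protein model without hydrogens or waters.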

Subunit-1 has been used as the reference subunit for the 
structure analysis. The package O [41] and WHAT IF [42] 
have been used for structure analysis calculations. Figs 9, 10 
and 11 have been drawn by XOBJECTS (MEM Noble, 
Oxford University, unpublished program). The sequence 
alignments used for Fig. 2 were calculated by the PILEUP 
program of the GCG-package [43]. The sequences were taken 
from sequence databanks. 

The coordinates have been deposited in the Brookhaven 
Protein Data Bank (accession code 1PXT). 